Abstract
Identifying dense bipartite subgraphs is a common graph data mining task. Many applications focus on the enumeration of all maximal bicliques (MBs), though sometimes the stricter variant of maximal induced bicliques (MIBs) is of interest. Recent work of Kloster et al. introduced a MIB-enumeration approach designed for “near-bipartite” graphs, where the runtime is parameterized by the size k of an odd cycle transversal (OCT), a vertex set whose deletion results in a bipartite graph. Their algorithm was shown to outperform the previously best-known algorithm even when k was logarithmic in |V|. In this paper, we introduce two new algorithms optimized for near-bipartite graphs: one which enumerates MIBs in time \(O(M_I |V| |E| k)\), and another based on the approach of Alexe et al. which enumerates MBs in time \(O(M_B |V| |E| k)\), where \(M_I\) and \(M_B\) denote the number of MIBs and MBs in the graph, respectively. We implement all of our algorithms in open-source C++ code and experimentally verify that the OCT-based approaches are faster in practice than the previously existing algorithms on graphs with a wide variety of sizes, densities, and OCT decompositions.
This work was supported by the Gordon & Betty Moore Foundation’s Data-Driven Discovery Initiative under Grant GBMF4560 to Blair D. Sullivan and the NC State College of Engineering REU program.
References
Agarwal, P., Alon, N., Aronov, B., Suri, S.: Can visibility graphs be represented compactly? Discret. Comput. Geom. 12, 347–365 (1994)
Akiba, T., Iwata, Y.: Branch-and-reduce exponential/FPT algorithms in practice: a case study of vertex cover. Theoret. Comput. Sci. 609, 211–225 (2016)
Alexe, G., Alexe, S., Crama, Y., Foldes, S., Hammer, P., Simeone, B.: Consensus algorithms for the generation of all maximal bicliques. Discret. Appl. Math. 145, 11–21 (2004)
Dawande, M., Keskinocak, P., Swaminathan, J., Tayur, S.: On bipartite and multipartite clique problems. J. Algorithms 41, 388–403 (2001)
Dias, V., De Figueiredo, C., Szwarcfiter, J.: Generating bicliques of a graph in lexicographic order. Theoret. Comput. Sci. 337, 240–248 (2005)
Eppstein, D.: Arboricity and bipartite subgraph listing algorithms. Inf. Process. Lett. 51, 207–211 (1994)
Garey, M., Johnson, D.: Computers and Intractability: A Guide to NP-Completeness. Freeman, San Francisco (1979)
Gély, A., Nourine, L., Sadi, B.: Enumeration aspects of maximal cliques and bicliques. Discret. Appl. Math. 157(7), 1447–1459 (2009)
Goodrich, T., Horton, E., Sullivan, B.: Practical graph bipartization with applications in near-term quantum computing. arXiv preprint arXiv:1805.01041 (2018)
Gülpinar, N., Gutin, G., Mitra, G., Zverovitch, A.: Extracting pure network submatrices in linear programs using signed graphs. Discret. Appl. Math. 137, 359–372 (2004)
Horton, E., Kloster, K., Sullivan, B.D., van der Poel, A., Woodlief, T.: MI-bicliques: Version 2.0, August 2019. https://doi.org/10.5281/zenodo.3381532
Hüffner, F.: Algorithm engineering for optimal graph bipartization. In: Nikoletseas, S.E. (ed.) WEA 2005. LNCS, vol. 3503, pp. 240–252. Springer, Heidelberg (2005). https://doi.org/10.1007/11427186_22
Chang, W.: Maximal biclique enumeration, December 2004. http://genome.cs.iastate.edu/supertree/download/biclique/README.html
Iwata, Y., Oka, K., Yoshida, Y.: Linear-time FPT algorithms via network flow. In: SODA, pp. 1749–1761 (2014)
Kaytoue-Uberall, M., Duplessis, S., Napoli, A.: Using formal concept analysis for the extraction of groups of co-expressed genes. In: Le Thi, H.A., Bouvry, P., Pham Dinh, T. (eds.) MCO 2008. CCIS, vol. 14, pp. 439–449. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87477-5_47
Kaytoue, M., Kuznetsov, S., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181, 1989–2011 (2011)
Kloster, K., Sullivan, B., van der Poel, A.: Mining maximal induced bicliques using odd cycle transversals. In: Proceedings of the 2019 SIAM International Conference on Data Mining (2019, to appear)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the web for emerging cyber-communities. Comput. Netw. 31, 1481–1493 (1999)
Kuznetsov, S.: On computing the size of a lattice and related decision problems. Order 18, 313–321 (2001)
Li, J., Liu, G., Li, H., Wong, L.: Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: a one-to-one correspondence and mining algorithms. IEEE Trans. Knowl. Data Eng. 19, 1625–1637 (2007)
Lokshtanov, D., Saurabh, S., Sikdar, S.: Simpler parameterized algorithm for OCT. In: Fiala, J., Kratochvíl, J., Miller, M. (eds.) IWOCA 2009. LNCS, vol. 5874, pp. 380–384. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10217-2_37
Makino, K., Uno, T.: New algorithms for enumerating all maximal cliques. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 260–272. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27810-8_23
Mushlin, R., Kershenbaum, A., Gallagher, S., Rebbeck, T.: A graph-theoretical approach for pattern discovery in epidemiological research. IBM Syst. J. 46, 135–149 (2007)
Panconesi, A., Sozio, M.: Fast hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS, vol. 3240, pp. 266–277. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30219-3_23
Peeters, R.: The maximum edge biclique problem is NP-complete. Discret. Appl. Math. 131, 651–654 (2003)
Sanderson, M., Driskell, A., Ree, R., Eulenstein, O., Langley, S.: Obtaining maximal concatenated phylogenetic data sets from large sequence databases. Mol. Biol. Evol. 20, 1036–1042 (2003)
Schrook, J., McCaskey, A., Hamilton, K., Humble, T., Imam, N.: Recall performance for content-addressable memory using adiabatic quantum optimization. Entropy 19, 500 (2017)
Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 505–517 (1977)
Wernicke, S.: On the algorithmic tractability of single nucleotide polymorphism (SNP) analysis and related problems (2014)
Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets. NATO Advanced Study Institutes Series (Series C– Mathematical and Physical Sciences), vol. 83, pp. 445–470. Springer, Dordrecht (1982). https://doi.org/10.1007/978-94-009-7798-3_15
Yannakakis, M.: Node-and edge-deletion NP-complete problems. In: STOC, pp. 253–264 (1978)
Zhang, Y., Phillips, C.A., Rogers, G.L., Baker, E.J., Chesler, E.J., Langston, M.A.: On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC Bioinform. 15, 110 (2014)
Appendices
A MIB-Enumeration Framework Subroutines
We now provide algorithmic details and proofs of the complexity and correctness of MakeIndMaximal and AddTo.
A.1 MakeIndMaximal
Recall that MakeIndMaximal takes in (C, S), where C is an induced biclique and \(S \subseteq V\), and either returns a MIB \(C^+\) where \(C \subseteq C^+\), \(C^+ \subseteq C \cup S\), \(C \ne \emptyset \), or returns \(\emptyset \). If it returns \(\emptyset \) and \(C \ne \emptyset \), then there is another MIB D which contains C and some vertex \(v \in (V \setminus S) \setminus C\). We give pseudo-code of MakeIndMaximal in Algorithm 3.
Lemma 5
MakeIndMaximal returns a MIB \(C^+\) where \(C \subseteq C^+\), \(C^+ \subseteq C \cup S\), \(C \ne \emptyset \), or returns \(\emptyset \).
Proof
Referring to the pseudo-code in Algorithm 3, it is clear that \(C \subseteq C^+\), as no vertices are ever removed from the input biclique C. Furthermore, the only vertices added to \(C^+\) are from S, so \(C^+ \subseteq C \cup S\) and \(C^+\) is the only biclique returned by MakeIndMaximal. Note that neither side of C is empty and the only vertices added are independent from the side of the biclique which they are added to, so if we do not return \(\emptyset \) the object returned is an induced biclique. If no node from outside of S can be added to \(C^+\), then we will not return \(\emptyset \) and thus \(C^+\) is maximal.
Lemma 6
If MakeIndMaximal returns \(\emptyset \) and \(C \ne \emptyset \), then there is another MIB D in G which contains C and some vertex \(v \in (V \setminus S) \setminus C\).
Proof
Note that \(C \subseteq C^* = C_1 \times C_2\) at line 12. As MakeIndMaximal returns \(\emptyset \), there must be a vertex \(v \in V_S = V \setminus (S \cup C^*)\) which can be added to \(C^*\). Let D be a MIB containing \(C^*\) and v; then D suffices to prove the lemma.
Lemma 7
MakeIndMaximal runs in O(m) time.
Proof
Note that because G is connected, \(n \in O(m)\). Setting \(C_S\) and \(V_S\) can be done in O(n) time. In each for loop, we can scan all of the edges incident to each v in the iterated-over set and keep count of how many nodes from \(C_i\) have been seen (checking for inclusion can be done in O(1) time with an O(n) initialization step). Thus, each edge is scanned at most once per for loop.
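The behaviour established in Lemmas 5 and 6 can be illustrated with a short Python sketch. Algorithm 3 is not reproduced here, so this is only one plausible reading of it: the adjacency-dictionary representation, the names `side_for` and `make_ind_maximal`, the example graph, and the use of `None` in place of \(\emptyset \) are all assumptions made for illustration. The sketch assumes both sides of C are non-empty and does not attempt to match the O(m) bound of Lemma 7.

```python
def side_for(adj, c1, c2, v):
    """Return 1 or 2 if v can join that side of the induced biclique, else 0."""
    nv = adj[v]
    if c2 <= nv and not (nv & c1):   # adjacent to all of c2, independent of c1
        return 1
    if c1 <= nv and not (nv & c2):   # adjacent to all of c1, independent of c2
        return 2
    return 0


def make_ind_maximal(adj, c1, c2, s):
    """Greedily extend c1 x c2 with vertices of s; None stands in for the empty set."""
    c1, c2 = set(c1), set(c2)
    # One pass suffices: a vertex rejected once stays rejected as the sides grow.
    for v in sorted(s - c1 - c2):
        side = side_for(adj, c1, c2, v)
        if side == 1:
            c1.add(v)
        elif side == 2:
            c2.add(v)
    # If a vertex outside s could still be added, the result is not a MIB.
    for v in set(adj) - s - c1 - c2:
        if side_for(adj, c1, c2, v):
            return None
    return c1, c2


# Hypothetical example graph: K_{2,2} on {0,1} x {2,3} plus vertex 4 adjacent to 0 and 2.
EXAMPLE_ADJ = {0: {2, 3, 4}, 1: {2, 3}, 2: {0, 1, 4}, 3: {0, 1}, 4: {0, 2}}
```

With \(S = \{1, 3\}\) the biclique \(\{0\} \times \{2\}\) grows to \(\{0,1\} \times \{2,3\}\); with \(S = \{1\}\) vertex 3 lies outside S and can still be added, so the sketch returns `None`, matching the situation described in Lemma 6.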
A.2 AddTo
Recall that AddTo takes in (C, v), where \(C=C_1 \times C_2\) is an induced biclique and \(v \in V \setminus (C_1 \cup C_2)\), and returns the induced biclique where v is added to \(C_1\), N(v) is removed from \(C_1\), and \(\overline{N}(v)\) is removed from \(C_2\) if \(C_2 \setminus \overline{N}(v) \ne \emptyset \), and \(\emptyset \) otherwise. We give pseudo-code of AddTo in Algorithm 4.
Lemma 8
AddTo returns the induced biclique where v is added to \(C_1\), N(v) is removed from \(C_1\), and \(\overline{N}(v)\) is removed from \(C_2\) if \(C_2 \setminus \overline{N}(v) \ne \emptyset \), and \(\emptyset \) otherwise.
Proof
Referring to the pseudo-code in Algorithm 4, it is clear that v is added to \(C_1\) and N(v) is removed from \(C_1\). Additionally, v’s non-neighbors are effectively removed from \(C_2\) by intersecting it with N(v). If \(C_2' = \emptyset \), then \(C_2 \setminus \overline{N}(v) = \emptyset \) and \(\emptyset \) is returned. Otherwise, \(C_1' \ne \emptyset \) since it includes v, and thus \(C_1' \times C_2'\) is a biclique. \(C_1' \times C_2'\) must be an induced biclique, as \(C_2' \subseteq C_2\), \(C_1' \setminus \{v\} \subseteq C_1\), \(C_1 \times C_2\) is an induced biclique, and \((N(v) \cap C_1') = \emptyset \) by definition.
Lemma 9
AddTo runs in O(m) time.
Proof
Note that because G is connected, \(n \in O(m)\). AddTo can be completed by scanning all of v’s O(m) incident edges in tandem with an O(n) preprocessing step to allow for constant-time look-ups when checking for inclusion in a set.
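The set operations described in Lemma 8 can be written out directly. The following Python sketch is a minimal illustration rather than the pseudo-code of Algorithm 4: the adjacency-dictionary representation, the name `add_to`, the example graph, and the use of `None` for \(\emptyset \) are assumptions.

```python
def add_to(adj, c1, c2, v):
    """Add v to side c1 of the induced biclique c1 x c2.

    Returns the new pair of sides, or None (standing in for the empty set)
    when no vertex of c2 is adjacent to v.
    """
    nv = adj[v]
    new_c1 = (c1 - nv) | {v}   # remove N(v) from v's own side, then add v
    new_c2 = c2 & nv           # removing non-neighbors equals intersecting with N(v)
    if not new_c2:
        return None
    return new_c1, new_c2


# Hypothetical example graph: K_{2,2} on {0,1} x {2,3}, vertex 4 adjacent to 0 and 2,
# and vertex 5 adjacent only to 0.
EXAMPLE_ADJ = {
    0: {2, 3, 4, 5}, 1: {2, 3}, 2: {0, 1, 4},
    3: {0, 1}, 4: {0, 2}, 5: {0},
}
```

Adding vertex 4 to \(\{0,1\} \times \{2,3\}\) drops its neighbor 0 from the first side and its non-neighbor 3 from the second, giving \(\{1,4\} \times \{2\}\). Adding vertex 5 leaves the second side empty, so `None` is returned.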
B MB-Enumeration Framework Subroutines
We give a detailed description of the MakeMaximal and Consensus subroutines used in OCT-MICA, along with arguments of their correctness and complexity.
B.1 MakeMaximal
Extending a biclique to be maximal is different in the non-induced case from the induced case, since MBs are completely characterized by one side of the biclique.
Lemma 10
MakeMaximal runs in O(m) time.
Proof
In order to form \(X^*\), we can scan the edges incident to each \(v \in Y\) and keep count of how many nodes from \(X^*\) have been seen (checking for inclusion can be done in O(1) time with an O(n) initialization step). The same can be done for \(Y^*\), where instead we scan the edges incident to each \(v \in X^*\). Thus, each edge is scanned at most twice in MakeMaximal.
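The counting argument in this proof can be sketched in Python as follows. The pseudo-code of MakeMaximal is not shown here, so the sketch assumes that \(X^*\) is the common neighborhood of the given side Y and that \(Y^*\) is the common neighborhood of \(X^*\). The names `common_neighbors` and `make_maximal`, the adjacency-dictionary representation, and the example graph are hypothetical.

```python
from collections import Counter


def common_neighbors(adj, side):
    """Vertices adjacent to every vertex of side, found by counting edge scans."""
    counts = Counter()
    for v in side:
        for u in adj[v]:          # each edge incident to side is scanned once
            counts[u] += 1
    return {u for u, c in counts.items() if c == len(side)}


def make_maximal(adj, y):
    """Extend a non-empty side y to a maximal (not necessarily induced) biclique."""
    x_star = common_neighbors(adj, y)
    if not x_star:
        return None               # y has no common neighbor, so no biclique exists
    y_star = common_neighbors(adj, x_star)
    return x_star, y_star


# Hypothetical example graph: K_{2,2} on {0,1} x {2,3} plus vertex 4 adjacent to 0 and 2.
EXAMPLE_ADJ = {0: {2, 3, 4}, 1: {2, 3}, 2: {0, 1, 4}, 3: {0, 1}, 4: {0, 2}}
```

Starting from \(Y = \{2\}\), the sketch returns \(\{0,1,4\} \times \{2\}\). Vertices 0 and 4 are adjacent, which is allowed because MBs need not be induced.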
B.2 Consensus
The MICA section of OCT-MICA relies heavily on the Consensus operation introduced in [3] for finding new candidate bicliques. For each pair of bicliques, there are four candidate bicliques which form the consensus of the pair. Note that any of the four candidates may be empty and, if so, is discarded. Consensus runs in O(n) time using standard techniques for set union and intersection.
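As an illustration, the four candidates can be computed with plain set operations. The sketch below assumes that each candidate is obtained by intersecting one side of the first biclique with one side of the second and uniting the two opposite sides, which is our reading of the operation in [3]. The function name `consensus` and the representation of a biclique as a pair of Python sets are assumptions.

```python
def consensus(b1, b2):
    """Return the non-empty consensus candidates of two bicliques.

    Each biclique is a pair (X, Y) of disjoint, non-empty vertex sets.
    """
    (x1, y1), (x2, y2) = b1, b2
    candidates = [
        (x1 & x2, y1 | y2),
        (x1 & y2, y1 | x2),
        (y1 & x2, x1 | y2),
        (y1 & y2, x1 | x2),
    ]
    # A candidate with an empty intersection side is not a biclique; discard it.
    return [(a, b) for a, b in candidates if a and b]
```

For example, the consensus of \(\{0,1\} \times \{2,3\}\) and \(\{0,4\} \times \{2\}\) contains the two candidates \(\{0\} \times \{2,3\}\) and \(\{2\} \times \{0,1,4\}\); the other two have an empty intersection and are dropped. Every vertex on the intersection side is adjacent to all vertices of both united sides, so each surviving candidate is again a biclique.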
C Additional Enumeration Experiments
Here we include figures corresponding to additional experimental results of our initial benchmarking and on the computational biology data from [29], described in Sects. 5.2 and 5.4, respectively (Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 and Tables 2, 3).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sullivan, B.D., van der Poel, A., Woodlief, T. (2019). Faster Biclique Mining in Near-Bipartite Graphs. In: Kotsireas, I., Pardalos, P., Parsopoulos, K., Souravlias, D., Tsokas, A. (eds) Analysis of Experimental Algorithms. SEA 2019. Lecture Notes in Computer Science, vol 11544. Springer, Cham. https://doi.org/10.1007/978-3-030-34029-2_28
DOI: https://doi.org/10.1007/978-3-030-34029-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34028-5
Online ISBN: 978-3-030-34029-2
eBook Packages: Computer Science, Computer Science (R0)